Session B1

Performance Prediction and Optimization

Conference
1:30 PM — 2:30 PM HKT
Local
Dec 2 Wed, 12:30 AM — 1:30 AM EST

Real-Time Scheduling and Analysis of OpenMP Programs with Spin Locks

He Du, Xu Jiang, Tao Yang, Mingsong Lv and Wang Yi

Locking protocols are an essential component of resource management in real-time systems, coordinating mutually exclusive accesses to shared resources from different tasks. OpenMP is a promising framework for multi-core real-time embedded systems and provides spin locks to protect shared resources. In this paper, we propose a resource model for analyzing OpenMP programs with spin locks. Based on this resource model, we develop a technique for analyzing the blocking time, which impacts the total workload. Notably, the resource model captures the detailed resource-access behavior of the programs, making our blocking analysis more accurate. Further, we derive a schedulability analysis for real-time OpenMP tasks with spin locks protecting shared resources. Experiments with realistic OpenMP programs are conducted to evaluate the performance of our method.
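
The paper's own resource model is not reproduced here, but the flavor of a spin-lock blocking bound can be sketched with a classic FIFO-spin-lock argument (MSRP-style, not the paper's analysis): each request to a resource spins behind at most one critical section from each of the other cores. All names below are illustrative assumptions.

```python
# Illustrative MSRP-style blocking bound for FIFO spin locks (not the
# paper's analysis): each of a task's requests to resource q can spin
# behind at most one critical section from each of the other m-1 cores.

def blocking_bound(requests: dict, cs_len: dict, m: int) -> int:
    """requests: per-resource request count of the task under analysis;
    cs_len: maximum critical-section length on each resource (from
    tasks on other cores); m: number of cores.
    Returns an upper bound on total spin-blocking time."""
    return sum(n * (m - 1) * cs_len[q] for q, n in requests.items())
```

For example, a task issuing two requests to a resource whose longest remote critical section is 5 time units, on a 4-core platform, spins for at most 2 × 3 × 5 = 30 units under this bound.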

Predicting Performance Degradation on Adaptive Cache Replacement Policy

Yi Zhang, Ran Cui, Mingsong Lv, Chuanwen Li and Qingxu Deng

Adaptive Cache Replacement Policy (ACRP) has been implemented in recently proposed commercial multi-core processors. ACRP maintains two candidate cache replacement policies and dynamically employs the one that currently incurs fewer cache misses. ACRP can diminish the overall cache misses, but at the same time it increases the performance interference between co-running applications and makes performance prediction much harder. Unfortunately, very little work has focused on the performance impact of this mechanism. In this paper, we first expose the performance-variation problem caused by adaptive cache replacement policies. Second, we present Bubble-Bound, a low-overhead measurement-based method to estimate a program's performance variation caused by the dynamic adaptation of cache replacement policies. By using a stress program to characterize pressure and sensitivity, our method can predict a bound on the performance degradation between co-located applications and enable "safe" co-locations on processors with ACRP.

Making Inconsistent Components More Efficient for Hybrid B+Trees

Xiongxiong She, Chengliang Wang and Fenghua Tu

The emergence of non-volatile memories (NVMs) provides opportunities for efficiently manipulating and storing tree-based indexes. Traditional tree structures fail to take full advantage of NVMs due to the considerable shifting of entries. These redundant write activities induce severe performance collapse and power consumption, which is unacceptable for embedded systems. Advanced schemes, such as NV-Tree, keep only leaf nodes in NVM to lighten the write burden. However, NV-Tree suffers from frequent reconstruction of its in-DRAM structures. In this paper, we develop a novel tree structure, Marionette-Tree, to address these issues. We follow the hybrid layout of non-leaf/leaf nodes and adopt bitmap-based leaf nodes to minimize NVM writes. We aggregate the pointers of internal nodes into a shadow array, allowing more keys to be recorded in an internal node. We also design a delayed-split-migration scheme to minimize the management overhead of the shadow array. Extensive evaluations demonstrate that Marionette-Tree can achieve 1.38× and 5.73× insertion speedup over two state-of-the-art schemes.
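
The write-saving property of bitmap-based leaf nodes can be sketched as follows: an insert writes one entry into any free slot and flips one validity bit, so existing entries are never shifted. The structure below is an illustrative simplification, not the paper's implementation.

```python
# Sketch of a bitmap-based leaf node (illustrative, not Marionette-Tree's
# actual layout): a validity bitmap marks occupied slots, so an insert
# touches one slot plus one bitmap word and never shifts entries --
# the property that minimizes NVM writes.

class BitmapLeaf:
    SLOTS = 8

    def __init__(self):
        self.bitmap = 0                    # bit i set => slot i holds a valid entry
        self.slots = [None] * self.SLOTS

    def insert(self, key, value) -> bool:
        for i in range(self.SLOTS):
            if not (self.bitmap >> i) & 1:   # first free slot
                self.slots[i] = (key, value) # one entry write ...
                self.bitmap |= 1 << i        # ... plus one small bitmap write
                return True
        return False                         # leaf full: split (not shown)

    def lookup(self, key):
        for i in range(self.SLOTS):
            if (self.bitmap >> i) & 1 and self.slots[i][0] == key:
                return self.slots[i][1]
        return None
```

Deleting an entry would similarly clear one bitmap bit rather than compacting the slot array.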

Session Chair

Nan Guan (The Hong Kong Polytechnic University)

Session B2

Edge and Persistent Memory

Conference
1:30 PM — 2:30 PM HKT
Local
Dec 2 Wed, 12:30 AM — 1:30 AM EST

XOR-Net: An Efficient Computation Pipeline for Binary Neural Network Inference on Edge Devices

Shien Zhu, Luan H. K. Duong, and Weichen Liu

Accelerating the inference of Convolutional Neural Networks (CNNs) on edge devices is essential due to the small memory size and limited computation capability of these devices. Network quantization methods such as XNOR-Net, Bi-Real-Net, and XNOR-Net++ reduce the memory usage of CNNs by binarizing them. They also simplify the multiplication operations to bit-wise operations and obtain good speedup on edge devices. However, there are hidden redundancies in the computation pipeline of these methods, constraining the speedup of the binarized CNNs.
In this paper, we propose XOR-Net, an optimized computation pipeline for binary networks both without and with scaling factors. As XNOR is realized by two instructions, XOR and NOT, on CPU/GPU platforms, XOR-Net avoids the NOT operations by using XOR instead of XNOR, thus reducing the bit-wise operations in both aforementioned kinds of binary convolution layers. For binary convolution with scaling factors, XOR-Net further rearranges the computation sequence of calculating and multiplying the scaling factors to reduce full-precision operations. Theoretical analysis shows that XOR-Net reduces one-third of the bit-wise operations compared with traditional binary convolution, and up to 40% of the full-precision operations compared with XNOR-Net. Experimental results show that our XOR-Net binary convolution without scaling factors achieves up to 135× speedup and consumes no more than 0.8% of the energy of parallel full-precision convolution. For binary convolution with scaling factors, XOR-Net is up to 17% faster and 19% more energy-efficient than XNOR-Net.
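
The saving can be illustrated on the bit-wise dot-product core of binary convolution. The sketch below (function names are ours, not the paper's) packs ±1 values into integer bit masks, where bit 1 encodes +1 and bit 0 encodes −1: the traditional pipeline computes XNOR then popcount, costing an extra NOT (and mask) on CPUs without a native XNOR instruction, while the XOR-only form recovers the same dot product from the mismatch count.

```python
# Dot product of two n-bit binarized vectors (bit 1 = +1, bit 0 = -1),
# illustrating the redundancy XOR-Net removes.

def dot_xnor(a: int, b: int, n: int) -> int:
    """Traditional pipeline: XNOR counts matching bits.
    On CPU/GPU, XNOR costs two instructions: XOR then NOT."""
    xnor = ~(a ^ b) & ((1 << n) - 1)     # mask back to n bits
    return 2 * bin(xnor).count("1") - n  # matches - mismatches

def dot_xor(a: int, b: int, n: int) -> int:
    """XOR-Net style: popcount of XOR counts mismatching bits,
    so the NOT (and the mask) is avoided entirely."""
    return n - 2 * bin(a ^ b).count("1")
```

Both functions return the same value; the XOR form simply drops one bit-wise operation per word, which is where the reduction of roughly one-third of bit-wise operations comes from.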

Load-Balance-Aware Data Sharing Systems in Heterogeneous Edge Environments

Sheng Chen, Zheng Chen, Siyuan Gu, Baochao Chen, Junjie Xie and Deke Guo

Edge computing has become the de facto method for delay-sensitive applications, in which computation and storage resources are placed at the edge of the network. The main responsibility of edge computing is to carry data from Cloud downlinks and terminal uplinks, and to organize these data well on the edge side, which is the basis for subsequent analysis and processing of the data. This raises the question of how those data should be organized on the edge and how to store and retrieve them. In response to this demand, several methods have been proposed to build data storage and retrieval services on the edge side, following three different solutions: structured, unstructured, and hybrid schemes. However, data storage and retrieval services for heterogeneous edge environments still lack research; in particular, load balancing, an important design goal, is not yet considered when data is stored on the edge side. In this paper, we design and implement w-strategy, a load-balancing approach that achieves appropriate load balance among heterogeneous edge nodes by using the weighted Voronoi diagram. Our solution utilizes the software-defined networking paradigm to support virtual-space-based distributed hash tables (DHTs) for distributing data. Evaluation results show that w-strategy achieves better load balancing among heterogeneous edge nodes than the existing methods GRED and Chord, and reduces the average underutilization of resources by 20%.
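
The weighted-Voronoi placement idea can be sketched as follows: each edge node owns a point in a virtual space and a capacity weight, and a data item is stored at the node minimizing distance divided by weight, so heavier (higher-capacity) nodes own larger cells. All names and the hashing scheme below are our illustrative assumptions, not the paper's implementation.

```python
# Sketch of multiplicatively weighted Voronoi data placement on
# heterogeneous edge nodes (illustrative; not w-strategy's actual code).

import hashlib
import math

def virtual_point(key: str):
    """Hash a data key onto the unit square of the virtual space."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return ((h & 0xFFFF) / 0xFFFF, ((h >> 16) & 0xFFFF) / 0xFFFF)

def nearest(point, nodes):
    """nodes maps name -> (position, weight). A larger weight shrinks a
    node's effective distance, enlarging its Voronoi cell."""
    return min(nodes, key=lambda n: math.dist(nodes[n][0], point) / nodes[n][1])

def place(key: str, nodes) -> str:
    """Store a data item at the weighted-nearest node to its hashed point."""
    return nearest(virtual_point(key), nodes)
```

Tuning the per-node weights (e.g. from measured load) is what lets such a scheme rebalance data across nodes of unequal capacity.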

Themis: Malicious Wear Detection and Defense for Persistent Memory File Systems

Wenbin Wang, Chaoshu Yang, Runyu Zhang, Shun Nie, Xianzhang Chen and Duo Liu

Persistent memory file systems can significantly improve performance by utilizing the advanced features of emerging Persistent Memories (PMs). Unfortunately, PMs have limited write endurance, a problem that the design of persistent memory file systems usually ignores. Accordingly, write-intensive applications, especially malicious wear-attack viruses, can quickly damage the underlying PMs by calling the common interfaces of persistent memory file systems to continuously write a few cells of PM, which seriously threatens the data reliability of file systems. Existing solutions to this problem based on persistent memory file systems are not systematic and ignore the practically unlimited write endurance of DRAM. In this paper, we propose a malicious wear detection and defense mechanism for persistent memory file systems, called Themis, to solve this problem. Themis identifies malicious wear attacks according to the write traffic and the configured lifespan of PM. We then design a wear-leveling scheme and migrate the writes of malicious wear attackers into DRAM to improve the lifespan of PMs. We implement Themis in the Linux kernel based on NOVA, a state-of-the-art persistent memory file system. Compared with DWARM, the state-of-the-art wear-aware memory management technique, experimental results show that Themis improves the lifetime of PM by 5774× and performance by 1.13×.
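
The detection criterion can be sketched as a rate check: given a target lifespan and a cell endurance budget, a writer whose traffic to a PM region exceeds the sustainable write rate is flagged and its writes redirected to DRAM. All constants and names below are our illustrative assumptions, not Themis's actual parameters.

```python
# Illustrative sketch of lifespan-based wear-attack detection (not the
# paper's code): extrapolate a region's write rate and flag it when it
# would exhaust the PM endurance budget before the configured lifespan.

ENDURANCE = 10**8                         # assumed PM cell write endurance
LIFESPAN_SECONDS = 5 * 365 * 24 * 3600    # assumed 5-year target lifespan
SAFE_RATE = ENDURANCE / LIFESPAN_SECONDS  # max sustainable writes/sec per cell

def is_malicious(writes_to_region: int, elapsed_seconds: float) -> bool:
    """A region written faster than the sustainable rate is suspect;
    a Themis-like defense would migrate such writes into DRAM, which
    has effectively unlimited write endurance."""
    return writes_to_region / elapsed_seconds > SAFE_RATE
```

A real detector would also need wear-leveling across regions so that an attacker cannot dodge the check by rotating over a small set of cells.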

Session Chair

Mingsong Lv (Northeastern University)
